This notebook is Part 1 of the dataset enrichment notebook series where we utilize various zero-shot models to enrich datasets.

Part 1 - Dataset Enrichment with Zero-Shot Classification Models
Part 2 - Dataset Enrichment with Zero-Shot Detection Models
Part 3 - Dataset Enrichment with Zero-Shot Segmentation Models

👍 Purpose This notebook shows how to enrich your image dataset using labels generated with open-source zero-shot image classification (or image tagging) models such as Recognize Anything (RAM) and Tag2Text. By the end of the notebook, you’ll learn how to:

Install and load the RAM and Tag2Text models in fastdup.

Enrich your dataset using labels generated by the RAM and Tag2Text models.

Run inference using the RAM and Tag2Text models on a single image.

Installation

First, let’s install the necessary packages:

fastdup - To analyze issues in the dataset.
Recognize Anything - To use the RAM and Tag2Text models.
gdown - To download demo data hosted on Google Drive.

Run the following to install all the above packages.

pip install -Uq fastdup git+https://github.com/xinyu1205/recognize-anything.git@119a7ae42fb2ce75459cd9107b353bc508460023 gdown

Test the installation. If there’s no error message, we are ready to go.

import fastdup
fastdup.__version__

'1.57'

🚧 CUDA Runtime fastdup runs well on CPUs, but larger models like RAM and Tag2Text run much slower on a CPU than on a GPU. The code in this notebook runs on either, but we highly recommend a CUDA-enabled environment to reduce the run time. Running this notebook in Google Colab or Kaggle is a good start!
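To see which device is available before loading the larger models, a small check like the one below may help. It is only a sketch: it assumes PyTorch is the backend used by RAM and Tag2Text, and `pick_device` is a hypothetical helper, not part of fastdup.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, else "cpu"."""
    try:
        import torch  # assumption: PyTorch is installed alongside RAM/Tag2Text
    except ImportError:
        # Without PyTorch we cannot probe for a GPU, so report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

If this prints `cpu`, the rest of the notebook should still run, only more slowly.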

Download Dataset

We use the coco-minitrain dataset, a curated mini-training set consisting of 20% of the COCO 2017 training dataset. It contains 25,000 images and their annotations.

First, let’s download and unzip the dataset.

gdown --fuzzy https://drive.google.com/file/d/1iSXVTlkV1_DhdYpVDqsjlT4NJFQ7OkyK/view
unzip -qq coco_minitrain_25k.zip
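Before moving on, it can be worth confirming that the archive extracted as expected. The sketch below counts image files under the extracted folder using only the standard library. The folder name `./coco_minitrain_25k` is taken from the code later in this tutorial; `count_images` and the set of file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: the dataset images use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root: str) -> int:
    """Count image files under root, recursing into subfolders."""
    root_path = Path(root)
    if not root_path.is_dir():
        return 0  # folder missing: nothing was extracted here
    return sum(
        1
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


print(count_images("./coco_minitrain_25k"))
```

A count of 0 usually means the download or unzip step did not complete, or that the notebook is running from a different working directory.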

Inference with RAM and Tag2Text

Within fastdup you can readily use the zero-shot image tagging models such as Recognize Anything Model (RAM) and Tag2Text. Both Tag2Text and RAM exhibit strong recognition ability.

RAM is an image tagging model that can recognize any common category with high accuracy. It outperforms CLIP and BLIP.
Tag2Text is a vision-language model guided by tagging that supports captioning, retrieval, and tagging.

1. Inference on a bulk of images

To run inference on the downloaded dataset, you first need to load the image paths into a DataFrame.

import pandas as pd
from fastdup.utils import get_images_from_path

fd = fastdup.create(input_dir='./coco_minitrain_25k')
filenames = get_images_from_path(fd.input_dir)

df = pd.DataFrame(filenames, columns=["filename"])

Here’s a DataFrame with images loaded from the folder. Running zero-shot image tagging on the DataFrame is as easy as:

NUM_ROWS_TO_ENRICH = 10                          # for demonstration, run on only 10 rows.

df = fd.enrich(task='zero-shot-classification',
               model='recognize-anything-model', # specify model
               input_df=df,                      # the DataFrame of image files to enrich.
               input_col='filename',             # the name of the filename column.
               num_rows=NUM_ROWS_TO_ENRICH       # number of rows in the DataFrame to enrich. Optional.
     )

📘 More on fd.enrich fd.enrich enriches an input DataFrame by applying a specified model to perform a specific task. The call above uses the parameters task, model, input_df, input_col, and num_rows (optional).

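As a result of running fd.enrich, an additional column 'ram_tags' is appended to the DataFrame, listing all the relevant tags for the corresponding image. Let’s plot the results of the enrichment to see the tags and captions given by the RAM and Tag2Text models. The plot also reads the 'tag2text_tags' and 'tag2text_caption' columns, which are present only if the DataFrame has also been enriched with the Tag2Text model.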

import matplotlib.pyplot as plt
from PIL import Image

# Iterate over each enriched row in the DataFrame
for index, row in df.iterrows():
    filename = row['filename']
    ram_labels = row['ram_tags']
    # The Tag2Text columns exist only if Tag2Text enrichment was also run,
    # so fall back to 'n/a' instead of raising a KeyError.
    tag2text_labels = row.get('tag2text_tags', 'n/a')
    tag2text_caption = row.get('tag2text_caption', 'n/a')

    # Read the image using PIL
    image = Image.open(filename)

    # Plot the image with its tags and caption as the title
    plt.imshow(image)
    plt.title(f"RAM Tags - [{ram_labels}]\n\nTag2Text Tags - [{tag2text_labels}]\n\nTag2Text Caption - [{tag2text_caption}]\n", wrap=True)
    plt.axis('off')

    plt.show()
    plt.close()
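Beyond plotting, a quick frequency count can show which concepts dominate the enriched rows. The sketch below assumes each entry in the 'ram_tags' column is a single string with tags separated by " . ", the format printed by the single-image example in this tutorial. `tag_frequencies` is a hypothetical helper, not a fastdup function.

```python
from collections import Counter
from typing import Iterable


def tag_frequencies(tag_strings: Iterable[str], sep: str = " . ") -> Counter:
    """Count in how many images each tag appears."""
    counts: Counter = Counter()
    for tags in tag_strings:
        if not isinstance(tags, str):
            continue  # skip missing values such as None or NaN
        # Use a set so a tag is counted at most once per image.
        counts.update({t.strip() for t in tags.split(sep) if t.strip()})
    return counts


# Illustrative sample strings, not real model output.
sample = ["bean . cup . table", "cup . plate . table", "dog . grass"]
freq = tag_frequencies(sample)
print(freq["cup"], freq["dog"])  # 2 1
```

With the enriched DataFrame, the same helper could be called as `tag_frequencies(df['ram_tags'])`, assuming the column holds strings in that format.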

2. Inference on a single image

The same models can also be used directly in fastdup with a few lines of code. Suppose we’d like to run inference on the following image.

from IPython.display import Image
Image("coco_minitrain_25k/images/val2017/000000181796.jpg")

We can import RecognizeAnythingModel and run inference on it.

from fastdup.models_ram import RecognizeAnythingModel

model = RecognizeAnythingModel()
result = model.run_inference("coco_minitrain_25k/images/val2017/000000181796.jpg")

Let’s inspect the results.

print(result)

bean . cup . table . dinning table . plate . food . fork . fruit . wine . meal . meat . peak . platter . potato . silverware . utensil . vegetable . white . wine glass
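The printed output is one string with tags separated by " . ", so it often helps to turn it into a Python list before filtering or searching. The following is a minimal sketch that assumes the result is a plain string in the format shown above; `parse_tags` is a hypothetical helper, not part of fastdup.

```python
from typing import List


def parse_tags(result: str, sep: str = " . ") -> List[str]:
    """Split a tag string into a list, dropping blanks and duplicates, keeping order."""
    seen = set()
    tags = []
    for raw in result.split(sep):
        tag = raw.strip()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


# A shortened copy of the output above, used here as sample input.
sample_result = "bean . cup . table . dinning table . plate"
tags = parse_tags(sample_result)
print(tags)           # ['bean', 'cup', 'table', 'dinning table', 'plate']
print("cup" in tags)  # True
```

Multi-word tags stay intact because the split happens on the " . " separator rather than on whitespace.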

👍 Tip As shown above, the model outputs all tags associated with the query image. But what if you have a collection of images and would like to run zero-shot classification on all of them? fastdup provides the fd.enrich API for exactly that.

Wrap Up

In this tutorial, we showed how you can run zero-shot image classification (or image tagging) models to enrich your dataset. This notebook is Part 1 of the dataset enrichment notebook series where we utilize various zero-shot models to enrich datasets.

Part 1 - Dataset Enrichment with Zero-Shot Classification Models
Part 2 - Dataset Enrichment with Zero-Shot Detection Models
Part 3 - Dataset Enrichment with Zero-Shot Segmentation Models

👍 Next Up Try out the Google Colab and Kaggle notebooks to reproduce this example. Also, check out Part 2 of the series, where we explore how to generate bounding boxes from the tags using zero-shot detection models like Grounding DINO. See you there!

Questions about this tutorial? Reach out to us on our Slack channel!

VL Profiler - A faster and easier way to diagnose and visualize dataset issues

The team behind fastdup also recently launched VL Profiler, a no-code cloud-based platform that lets you leverage fastdup in the browser. VL Profiler lets you find:

Duplicates/near-duplicates.
Outliers.
Mislabels.
Non-useful images.

Here’s a highlight of the issues found in the RVL-CDIP test dataset on the VL Profiler.

👍 Free Usage Use VL Profiler for free to analyze issues on your dataset with up to 1,000,000 images. Get started for free.

Not convinced yet? Interact with a collection of datasets like ImageNet-21K, COCO, and DeepFashion here. No sign-ups needed.
